Goto

Collaborating Authors

 potential well


How to explain grokking

arXiv.org Artificial Intelligence

Simple ideas of thermodynamics and kinetic theory allow us to explain better generalization observed for learning by the stochastic gradient optimization procedure, see also [7] (where also overfitting control for GAN model was discussed). We also have explained the grokking(delayed generalization) phenomenon and some properties of grokking observed in [8].


Control of Overfitting with Physics

arXiv.org Machine Learning

Analogies from physics and other fields, particularly population genetics, are of interest when studying problems in machine learning theory. Analogies between machine learning theory and Darwinian evolution theory were discussed already by Alan Turing [1]. Biological analogies in computing were discussed by John von Neumann [2]. Physical models in relation to computing were discussed by Yuri Manin [3]. Such analogies allow physical intuition to be used in learning theory. Among the well-known examples are genetic [4] and evolutionary algorithms [5], models of neural networks and physical systems with emergent collective computational abilities and contentaddressable memory [6], a parallel search learning method based on statistical mechanics and Boltzmann machines that mimic Ising spin chains [7]. A phenomenological model of population genetics, the Lotka-Volterra model with mutations, related to generative adversarial network (GAN) was introduced in [8]. Analogies between evolution operator in physics and transformers (an artificial intelligence model) were discussed in [9]. Ideas of thermodynamics in application to learning were considered in [10,11] and in relation to the evolution theory in [12,13].


Quantum-Inspired Neural Network Model of Optical Illusions

arXiv.org Artificial Intelligence

Ambiguous optical illusions have been a paradigmatic object of fascination, research and inspiration in arts, psychology and video games. However, accurate computational models of perception of ambiguous figures have been elusive. In this paper, we design and train a deep neural network model to simulate the human's perception of the Necker cube, an ambiguous drawing with several alternating possible interpretations. Defining the weights of the neural network connection using a quantum generator of truly random numbers, in agreement with the emerging concepts of quantum artificial intelligence and quantum cognition we reveal that the actual perceptual state of the Necker cube is a qubit-like superposition of the two fundamental perceptual states predicted by classical theories. Our results will find applications in video games and virtual reality systems employed for training of astronauts and operators of unmanned aerial vehicles. They will also be useful for researchers working in the fields of machine learning and vision, psychology of perception and quantum-mechanical models of human mind and decision-making.


Reaction coordinate flows for model reduction of molecular kinetics

arXiv.org Machine Learning

In this work, we introduce a flow based machine learning approach, called reaction coordinate (RC) flow, for discovery of low-dimensional kinetic models of molecular systems. The RC flow utilizes a normalizing flow to design the coordinate transformation and a Brownian dynamics model to approximate the kinetics of RC, where all model parameters can be estimated in a data-driven manner. In contrast to existing model reduction methods for molecular kinetics, RC flow offers a trainable and tractable model of reduced kinetics in continuous time and space due to the invertibility of the normalizing flow. Furthermore, the Brownian dynamics-based reduced kinetic model investigated in this work yields a readily discernible representation of metastable states within the phase space of the molecular system. Numerical experiments demonstrate how effectively the proposed method discovers interpretable and accurate low-dimensional representations of given full-state kinetics from simulations.


Fermion Sampling Made More Efficient

arXiv.org Artificial Intelligence

Fermion sampling is to generate probability distribution of a many-body Slater-determinant wavefunction, which is termed "determinantal point process" in statistical analysis. For its inherently-embedded Pauli exclusion principle, its application reaches beyond simulating fermionic quantum many-body physics to constructing machine learning models for diversified datasets. Here we propose a fermion sampling algorithm, which has a polynomial time-complexity -- quadratic in the fermion number and linear in the system size. This algorithm is about 100% more efficient in computation time than the best known algorithms. In sampling the corresponding marginal distribution, our algorithm has a more drastic improvement, achieving a scaling advantage. We demonstrate its power on several test applications, including sampling fermions in a many-body system and a machine learning task of text summarization, and confirm its improved computation efficiency over other methods by counting floating-point operations.


Support vector machines for learning reactive islands

arXiv.org Artificial Intelligence

We develop a machine learning framework that can be applied to data sets derived from the trajectories of Hamilton's equations. The goal is to learn the phase space structures that play the governing role for phase space transport relevant to particular applications. Our focus is on learning reactive islands in two degrees-of-freedom Hamiltonian systems. Reactive islands are constructed from the stable and unstable manifolds of unstable periodic orbits and play the role of quantifying transition dynamics. We show that support vector machines (SVM) is an appropriate machine learning framework for this purpose as it provides an approach for finding the boundaries between qualitatively distinct dynamical behaviors, which is in the spirit of the phase space transport framework. We show how our method allows us to find reactive islands directly in the sense that we do not have to first compute unstable periodic orbits and their stable and unstable manifolds. We apply our approach to the H\'enon-Heiles Hamiltonian system, which is a benchmark system in the dynamical systems community. We discuss different sampling and learning approaches and their advantages and disadvantages.


Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function

arXiv.org Machine Learning

Optimization is a central ingredient of machine learning. First-order optimization algorithms, for instance, are particularly popular for deep learning tasks due to their scalabilities to highdimensional problems, because they employ gradient but not higher-order information of objective functions for iteratively approximating minimizers. Among first-order methods, arguably the most used is gradient descent method (GD), or rather one of its variants, stochastic gradient descent method (SGD). Designed for objective functions that sum a large amount of terms, which for instance can originate from big data, SGD introduces a randomization mechanism of gradient subsampling to improve the scalability of GD (e.g., Zhang [2004], Moulines and Bach [2011], Roux et al. [2012]). Consequently, the iteration of SGD, unlike GD, is not deterministic even when it is started at a fixed initial condition.


Machine learning and serving of discrete field theories -- when artificial intelligence meets the discrete universe

arXiv.org Artificial Intelligence

A method for machine learning and serving of discrete field theories in physics is developed. The learning algorithm trains a discrete field theory from a set of observational data on a spacetime lattice, and the serving algorithm uses the learned discrete field theory to predict new observations of the field for new boundary and initial conditions. The approach to learn discrete field theories overcomes the difficulties associated with learning continuous theories by artificial intelligence. The serving algorithm of discrete field theories belongs to the family of structure-preserving geometric algorithms, which have been proven to be superior to the conventional algorithms based on discretization of differential equations. The effectiveness of the method and algorithms developed is demonstrated using the examples of nonlinear oscillations and the Kepler problem. In particular, the learning algorithm learns a discrete field theory from a set of data of planetary orbits similar to what Kepler inherited from Tycho Brahe in 1601, and the serving algorithm correctly predicts other planetary orbits, including parabolic and hyperbolic escaping orbits, of the solar system without learning or knowing Newton's laws of motion and universal gravitation. The proposed algorithms are also applicable when effects of special relativity and general relativity are important. The illustrated advantages of discrete field theories relative to continuous theories in terms of machine learning compatibility are consistent with Bostrom's simulation hypothesis.


Optimal Binary Classifier Aggregation for General Losses

Neural Information Processing Systems

We address the problem of aggregating an ensemble of predictors with known loss bounds in a semi-supervised binary classification setting, to minimize prediction loss incurred on the unlabeled data. We find the minimax optimal predictions for a very general class of loss functions including all convex and many non-convex losses, extending a recent analysis of the problem for misclassification error. The result is a family of semi-supervised ensemble aggregation algorithms which are as efficient as linear learning by convex optimization, but are minimax optimal without any relaxations. Their decision rules take a form familiar in decision theory -- applying sigmoid functions to a notion of ensemble margin -- without the assumptions typically made in margin-based learning.